Abstract: Streamflow forecasting in drylands is challenging. Data are scarce, catchments are highly human-modified
and streamflow exhibits strong nonlinear responses to rainfall. The goal of this study was to
evaluate the monthly and seasonal streamflow forecasting in two large catchments in the Jaguaribe River 
Basin in the Brazilian semi-arid area. We adopted four different lead times: one month ahead for monthly 
scale and two, three and four months ahead for seasonal scale. The gaps of the historic streamflow series 
were filled up by using rainfall-runoff modelling. Then, time series model techniques were applied, i.e., the 
locally constant, the locally averaged, the k-nearest-neighbours algorithm (k-NN) and the autoregressive 
model (AR). The criterion of reliability of the validation results is that the forecast is more skillful than 
streamflow climatology. Our approach outperformed the streamflow climatology for all monthly 
streamflows. On average, the former was 25% better than the latter. The seasonal streamflow forecasting 
(SSF) was also reliable (on average, 20% better than the climatology), failing slightly only for the high flow 
season of one catchment (6% worse than the climatology). Considering an uncertainty envelope 
(probabilistic forecasting), which was considerably narrower than the data standard deviation, the 
streamflow forecasting performance increased by about 50% at both scales. The forecast errors were mainly
driven by the streamflow intra-seasonality at monthly scale and by the forecast lead time at
seasonal scale. The best-fit and worst-fit time series models were the k-NN approach and the AR model,
respectively. The rainfall-runoff modelling outputs played an important role in improving streamflow 
forecasting for one streamgauge with 35% of data gaps. The developed data-driven approach is
mathematically and computationally very simple, demands few resources for its operational
implementation and is applicable to other dryland watersheds. Our findings may be part of drought
forecasting systems and potentially help allocate water months in advance. Moreover, the developed
strategy can serve as a baseline for more complex streamflow forecast systems. 
1 Introduction


Brazil is a land where water abounds. However, this resource is unevenly distributed throughout the 
country's territory. A particularly drought-prone region is the northeast part of the country, which has
been struck by over 100 severe droughts since the 16th century (Fioreze et al., 2012). A period of
exceptionally strong droughts has been affecting the region since 2012, with severe social and
economic consequences. Despite the extreme environmental conditions, more than 25% of the 
Brazilian population live within the so-called "drought polygon" in northeastern Brazil (NE-Brazil) 
(Formiga-Johnsson and Kemper, 2005). 

Water management and planning in NE-Brazil has been centered on the storage of surface water 
by building dams (Araújo, 1990). In the drought polygon, groundwater resources are generally 
scarce, and so several thousands of reservoirs (mostly small dams) have been built in recent decades. 
An interstate water transfer system is also under construction (Nunes, 2012). In the last 30 a, the 
creation of sub-basin committees and user commissions has involved hundreds of stakeholders such 
as municipalities, public and large private irrigators, fishermen and industry leaders in the process of 
water allocation and conflict resolution. These experiences were very important transformations in 
water management practices and increased the demands for technical support, principally in the form 
of streamflow forecasts, which perfectly fall within a context of proactive drought risk management 
(Crochemore et al., 2016). 

Reliable seasonal streamflow forecast information is a key aspect of drought mitigation (Shukla 
and Lettenmaier, 2011), since the water allocation process for a given rainy season may start prior to 
its end, optimizing water releases to multiple competing users. Also, measures of water demand 
reduction can be applied more efficiently, avoiding abrupt water shortages. Despite the widespread 
recognition of the relevance and importance of seasonal streamflow forecasting (SSF) in the research 
community, forecasting of streamflow for drought management has not been applied widely 
(Trambauer et al., 2015). The forecast information must satisfy the need for a thorough drought 
assessment without overwhelming end users with high complexity (Seibert et al., 2017). 

Approaches for SSF predominantly fall into two categories: statistical or dynamic. The former 
frequently utilizes predictors of sea-surface temperature or a related index to directly estimate 
streamflow through regression techniques (Seibert et al., 2017; Delgado et al., 2018). The latter seeks 
to use numerical climate models linked with conceptual or physically based hydrological models 
through either an iterative (online) or static (offline) procedure strategy (Collischonn et al., 2007; 
Yossef et al., 2013; Yuan, 2016; Yuan et al., 2016). 

Souza Filho and Lall (2003) developed a semiparametric approach for forecasting inflows at 
reservoirs in the State of Ceará (Ceará) in NE-Brazil conditional on the NINO3 (the mean monthly
temperature anomaly in the area of the eastern tropical Pacific: 5°S-5°N; 150°W-90°W) index for the El
Niño Southern Oscillation (ENSO) and an equatorial Atlantic sea surface temperature index.
Forecasts of January through December streamflow were made at three lead times: in January of the 
same year and in October and July of the preceding year. Large-scale climatic patterns have 
commonly been applied for improving long lead time streamflow forecasts (Moradkhani and Meier, 
2010; Kalra et al., 2012; Kalra et al., 2013). Souza Filho and Lall (2003) found that the streamflow at the Ceará sites is highly
spatially correlated and is influenced by climate in a similar manner, leading to a common, 
underlying model for all sites. Although the correlation of the median forecast with the observed 
annual inflows of Orós reservoir, the second largest reservoir in Ceará, was consistently high (0.91) 
for the validation period (1993-2000), the dispersion of the forecasting ensemble was quite high too, 
reaching, for example, 65% of the reservoir capacity (difference between 75th and 25th percentiles of
ensemble forecasts) in 2000. 

Delgado et al. (2018) assessed a set of seasonal drought forecast models for the Jaguaribe River in 
semi-arid NE-Brazil. The forecast issue time was January and the forecast period was January to 
June. Their work employed a cascade of models and algorithms ranging from two general circulation 
models (GCM) (one atmospheric and one coupled) at the top to hydrological indices at the bottom. 
Three statistical, meteorological downscaling approaches were applied to the GCM outputs. 
Reservoir volumes were obtained by fitting a multivariate linear regression based on forecast 
meteorological drought indices as predictors, such as Standardized Precipitation Index (SPI) and 
Standardized Precipitation Evapotranspiration Index (SPEI) at different timescales. They found that
low precipitation events showed either very low or no skill. Moreover, the good skill of the reservoir 
storage forecast was likely related to the long memory of the reservoir system, because the forecasted
precipitation will affect the reservoir only marginally, since most of its storage is accumulated 
throughout several years. In fact, no approach had a Root Mean Square Error (RMSE) that 
significantly departed from the RMSE of the climatology (Delgado et al., 2018), i.e., the observed 
long-term average of a variable. 

Pilz et al. (2019) evaluated and compared the performances of seasonal reservoir storage forecasts 
derived by a process-based, semi-distributed hydrological model and a statistical approach, which 
was developed by Delgado et al. (2018). They found, in a hindcast experiment (1981-2014), that the
accuracy of estimating regional reservoir storages was considerably lower using the hydrological 
model. In fact, investigations regarding the deficiencies of the process-based model revealed a 
significant influence of antecedent wetness conditions and a higher sensitivity of model prediction 
performance to rainfall forecast quality. 

Applying a framework of GCM, multiple regional climate models, including dynamical and 
statistical models, and two lumped water balance models, Block et al. (2009) produced ensemble 
streamflow forecasts for a hindcast of 1974-1996 monthly streamflow of the Jaguaribe River. The 
best coupled model demonstrated high skill scores for the correlation (0.90) and the RMSE (8.1×10⁷
m³) (based on the ensemble median), but performed worse than climatology for the ranked
probability skill score (Wilks, 2005). Afterwards, Kwon et al. (2012) showed that the uncertainties
associated with the climate forecast are much larger than those from parameter estimation in the
assumed hydrological model. Thus, past studies indicate that no dynamical, statistical or hybrid
approach has outperformed the climatology for streamflow forecasting in the Jaguaribe River Basin.

Poor or lower performance of forecasting systems for (very) dry catchments has also been reported 
by other authors. Robertson and Wang (2012) introduced a predictor selection method for a Bayesian 
joint probability approach to 3-month-ahead streamflow forecasting at multiple sites in two 
catchments in eastern Australia. They found that the skill scores were considerably lower for the
intermittent rivers in the Burdekin River catchment than the perennial ones in the Goulburn River 
catchment. In fact, in the Burdekin River catchment, the skill of the streamflow forecasts was close 
to zero for many months. Sittichok et al. (2016) combined statistical rainfall forecasting models with 
rainfall-runoff modelling in the Sahel region. They found moderate skill (coefficient of
determination equal to 0.55) for the monthly streamflow with a 12-month forecasting lead time in 
the Sirba watershed, West Africa. Seibert et al. (2017) applied multiple linear models, artificial neural 
networks and random forest regression trees to forecast seasonal hydrological drought (standardized 
streamflow index) in the Limpopo River Basin in southern Africa. The models were set up to predict 
the standardised total streamflow of December-May, which is one value per year, at the lead times
of 1, 2, 3, 6, 9 and 12 months. At some streamflow stations, skill is present up to a 12-month lead 
time, but many stations (larger catchments in particular) only achieved little skill. These large 
catchments have not only a higher degree of human interference, but also drain large drylands in the 
Limpopo River Basin. Bennett et al. (2017) assessed a forecasting system based on a monthly 
rainfall-runoff model forced by ensemble rainfall forecasts for 63 Australian catchments, including 
21 ephemeral rivers. Although this system generally produced skillful forecasts at shorter lead times 
(<4 months), it did not perform well in very dry catchments, sometimes producing strongly negative 
forecast skill and poor reliability. Moreover, dry catchments are typically data-scarce
environments, where streamflow gauges are sparse and their time series are normally short (from
a few years to a few decades) and contain many gaps. Thus, overall, monthly streamflow forecasting
and SSF in drylands are challenging.

Although there are only a few examples in the literature, nonlinear univariate time series models 
have been shown as a promising tool to forecast semi-arid streamflow at daily (He et al., 2014) and 
monthly scales (Yassen et al., 2016). In this study, we assess the potential of the
streamflow series itself for streamflow forecasting at monthly to seasonal time scales in two large 
catchments in the Jaguaribe River Basin, using nonlinear time series analysis. The specific objectives 
are to fill the gaps and increase the available streamflow data and to produce deterministic and 
probabilistic monthly streamflow forecasting for different lead times (one-, two-, three- and
four-months-ahead). The studied semi-arid catchments have a high streamflow interannual variability and
relatively short time series with several gaps. A rainfall-runoff model was used to fill up the 
streamflow gaps. To our knowledge, this kind of hydrological model application for dryland 
streamflow forecasting has not been reported yet. Moreover, because the time series are short, the chosen
time series models were kept as simple as possible.


2 Study area and streamflow data 


Ceará (Fig. 1) is home to more than eight million people. The climate is predominantly semi-arid,
covering more than 90% of its territory. The mean annual precipitation is about 810 mm,
ranging from more than 1200 mm close to the coast, especially in some mountainous regions, to 
less than 600 mm in the large semi-arid landscape that extends from the coast to the interior borders 
(Werner and Gerstengarbe, 2003). The actual evapotranspiration is about 78% of the annual rainfall 
(SUDENE, 1980) while the potential evapotranspiration is four times the annual rainfall. The rainy 
season is mainly concentrated from February to May, accounting for about 70% of the annual 
rainfall. 
Fig. 1 Location of the Brazilian semi-arid area, the State of Ceará (Ceará) and the Jaguaribe River Basin (a), with
the main reservoirs and streamgauges (b) used in this study 


Interannual rainfall anomalies are driven primarily by anomalous patterns of Sea Surface 
Temperature (SST) variability in the Tropical Atlantic and the Equatorial Pacific (Hastenrath and 
Heller, 1977; Moura and Shukla, 1981; Uvo et al., 1998; Rodrigues et al., 2011). During the rainy
season, the southernmost displacement of the Intertropical Convergence Zone in the Atlantic Ocean
favours atmospheric conditions for precipitation events over or near Ceará.
Interannual and seasonal fluctuations assigned to excess (lack) of rainfall over Ceará are associated
with an asymmetrical interhemispheric gradient of SST anomalies in the Tropical Atlantic oriented
northward (southward) (Hastenrath and Heller, 1977; Moura and Shukla, 1981; Nobre and Shukla,
1996). Also, on interannual timescales, accumulated rainfall in Ceará is strongly modulated by the
ENSO, which is the prominent interannual mode associated with coupled oceanic-atmospheric
interactions in the Equatorial Pacific (Philander, 1990; Rao and Hada, 1990). The warm (cold)
oceanic phase of the ENSO is known as El Niño (La Niña), which is marked by abnormally warm
(cold) SST anomalies in the eastern-central Pacific that modify the Walker circulation and lead
to unfavourable (favourable) atmospheric conditions over Ceará.

The streamflow is naturally ephemeral or intermittent. It ranges from 10% to 20% of annual 
rainfall and shows high temporal variability with a coefficient of variation generally above 1.00 at 
an interannual scale (Güntner and Bronstert, 2004). The large rivers are dominantly endogenous and
interact with the underlying groundwater mainly by groundwater recharge (Costa et al., 2012a; 2013). 
Yet the groundwater resources are scarce and concentrated, occurring largely in sedimentary rocks 
on the state borders and on the coast, besides a sedimentary basin located in the middle of the state 
(Frischkorn et al., 2003). 

The recurrent droughts have been essentially treated as a supply problem to be resolved through 
massive construction and related water infrastructure, such as dams and water transfer schemes
among watersheds (Gutiérrez et al., 2014). As a result, there are more than 7000 dams with a surface
area larger than 5 hm² in Ceará (FUNCEME, 2008), which form a dense reservoir network that
strongly impacts the runoff connectivity at catchment scale and the river flow propagation (Güntner et
al., 2004; Mamede et al., 2012). The state and federal agencies manage 155 dams, which store
18.7×10⁹ m³, providing more than 90% of the water supply in Ceará. The two largest reservoirs are
the Castanhão and the Orós reservoirs with a capacity of 6.7×10⁹ and 1.94×10⁹ m³, respectively (Fig.
1). The Upper Jaguaribe River and its main tributary, the Salgado River, are the principal runoff 
sources of the Jaguaribe River Basin. Their flows have been monitored by the Brazilian Geological 
Service at Salgado streamgauge (SS) and Iguatu streamgauge (IS) (Fig. 1), which drain areas of 
12,400 and 20,700 km², respectively. This study concentrates on the river flows monitored at these
streamgauges. Their daily streamflow data are made available by the Brazilian Water Agency 
(http://www.hidroweb.ana.gov.br). We consider a period of analysis from 1951 to 2015. Over this
period, the IS shows few gaps, only 3.7% of the whole data (29 months). The gaps in the streamflow 
series of the SS represent 35.1% of the data (278 months). 


3 Methodology 


3.1 Time series models 


We assumed that the streamflow series is driven by a stochastic dynamical system, which governs 
the coupling between the climatological forcing and the human-modified catchment state that 
depends on natural landscape characteristics and anthropogenic effects, such as land use changes, 
reservoir construction and water withdrawal (Kirchner, 2009; Sivakumar and Singh, 2012; Costa et 
al., 2012b). A stochastic dynamical system can be expressed mathematically by time series models. 
The deterministic evolution operator (the dynamical or deterministic part) is approached by the 
dynamics of values of a unique variable (the delay embedding theorem; Takens, 1980), whereas the 
stochastic part by noise, which does not depend on the states of the dynamical part (Costa et al., 
2012b). In real-world open physical systems, such as catchment runoff, the presence of dynamical
and data noise is inevitable (Porporato and Ridolfi, 2001; Kantz and Schreiber, 2004). Equation 1
describes a stochastic dynamical system. 


$$X_{t+m\Delta t} = F\left(X_t, X_{t-\Delta t}, \ldots, X_{t-(n-1)\Delta t}\right) + \varepsilon_t \qquad (1)$$


where $X_{t+m\Delta t}$ is the forecast, $m\Delta t$ is the lead time, m is a positive natural number, F is the
deterministic term, n is the number of past steps and $\varepsilon$ may be white or coloured noise. Considering
a training set, we can empirically determine the $\varepsilon$ distribution by assuming a regression model for F.
Note that the distribution of $\varepsilon$ includes the uncertainty not only from the inherent fluctuations of the
streamflow data, but also from the fitting of the underlying dynamical term by an adopted regression
model (Costa et al., 2012b). So, the expected value of the forecast is equal to the deterministic term
F, and the uncertainty involved is calculated from the distribution of $\varepsilon$ with a confidence interval, e.g.,
50%, ascribing in this way a probability density function (PDF) or an uncertainty envelope to $X_{t+m\Delta t}$.
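
The construction of the uncertainty envelope can be illustrated with a minimal Python sketch. It assumes that the envelope is taken as the central empirical quantile interval of the training residuals (observed minus fitted), which is one plausible reading of the text; the function names and the numerical values are hypothetical and are not data from this study.

```python
import numpy as np

def residual_envelope(observed, fitted, level=0.5):
    """Central empirical interval of the training residuals (observed - fitted)."""
    residuals = np.asarray(observed, dtype=float) - np.asarray(fitted, dtype=float)
    lower_q = (1.0 - level) / 2.0
    eps_lower, eps_upper = np.quantile(residuals, [lower_q, 1.0 - lower_q])
    return float(eps_lower), float(eps_upper)

def forecast_with_envelope(point_forecast, eps_lower, eps_upper):
    """Shift the deterministic forecast F by the lower and upper residual limits."""
    return point_forecast + eps_lower, point_forecast, point_forecast + eps_upper

# Hypothetical training values, for illustration only.
obs = np.array([10.0, 12.0, 8.0, 15.0, 11.0])
fit = np.array([9.0, 13.0, 8.0, 12.0, 12.0])
eps_lo, eps_hi = residual_envelope(obs, fit, level=0.5)
low, mid, high = forecast_with_envelope(20.0, eps_lo, eps_hi)
print(low, mid, high)
```

With a 50% level, half of the training residuals fall inside the interval, so the envelope is narrower than one built from, e.g., a 90% level.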

We selected four regression models: (1) the locally constant (LC), whose prediction is simply equal
to the last streamflow measurement; (2) the locally averaged (LA), whose prediction is the average
of the streamflow measurements over the last n steps; (3) the traditional k-NN, whose prediction depends on
the number of neighbours k (ranging from 1 to 7 in this study) and on the power parameter of the
inverse distance weighting interpolation; and (4) the autoregressive model (AR), whose unknown
parameters are the number n of past streamflow measurements and their coefficients. The
latter was used for the dynamical term F as a reference to test the hypothesis of linear random data.
We did not apply an ARMA (autoregressive-moving-average model), because the noise inputs of the 
moving average model were not known before the application of Equation 1 and must be averaged 
(Kantz and Schreiber, 2004; Costa et al., 2012b). 
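
The four regression models can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: each training row holds the predictor months of one year, the k-NN distance is Euclidean, and the AR model includes an intercept. All function names and numbers are hypothetical.

```python
import numpy as np

def locally_constant(predictors):
    """LC: the forecast equals the last streamflow measurement."""
    return float(predictors[-1])

def locally_averaged(predictors, n=2):
    """LA: the forecast is the mean of the last n measurements."""
    return float(np.mean(predictors[-n:]))

def knn_idw(train_X, train_y, predictors, k=3, power=2.0):
    """k-NN: inverse-distance-weighted mean of the k nearest training targets."""
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    dist = np.linalg.norm(train_X - np.asarray(predictors, dtype=float), axis=1)
    nearest = np.argsort(dist, kind="stable")[:k]
    exact = dist[nearest] == 0.0
    if np.any(exact):
        # Exact matches would give infinite weights; average their targets instead.
        return float(np.mean(train_y[nearest][exact]))
    weights = 1.0 / dist[nearest] ** power
    return float(np.sum(weights * train_y[nearest]) / np.sum(weights))

def fit_ar(train_X, train_y):
    """AR: least-squares coefficients (intercept first) on the lagged values."""
    train_X = np.asarray(train_X, dtype=float)
    design = np.column_stack([np.ones(len(train_X)), train_X])
    coef, *_ = np.linalg.lstsq(design, np.asarray(train_y, dtype=float), rcond=None)
    return coef

def predict_ar(coef, predictors):
    """AR forecast from fitted coefficients and the latest lagged values."""
    return float(coef[0] + np.dot(coef[1:], predictors))

# Hypothetical example: rows are (January, February) flows, targets are March flows.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [0.0, 1.0]]
y = [3.5, 6.0, 8.5, 2.0]
new = [2.0, 3.0]
lc = locally_constant(new)
la = locally_averaged(new, n=2)
nn = knn_idw(X, y, new, k=2, power=2.0)
ar = predict_ar(fit_ar(X, y), new)
print(lc, la, nn, ar)
```

LC and LA need no training data, whereas k-NN and AR are fitted on the training set; this difference is why only the latter two require model adjustment.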


JOURNAL OF ARID LAND


3.2 Gap-filling streamflow data


We used the outcomes of the Soil Moisture Accounting Procedure (SMAP; Kwon et al., 2012), a
rainfall-runoff model (Lopes et al., 1981), to fill up the gaps of the streamgauges and to increase
their available streamflow data. The SMAP model was set up and calibrated, producing a good
performance for the monthly semi-arid streamflow at IS and SS with a Nash-Sutcliffe coefficient
of 0.78 and 0.86, respectively.

We performed a model calibration under parameter uncertainty for 28 streamgauges, including 
the IS and SS, using the differential evolution adaptive metropolis (DREAM) algorithm (Vrugt et 
al., 2008; Vrugt et al., 2009; Vrugt, 2016). In this study, we considered only the average over the
simulated streamflow envelope, which was based on one thousand parameter vectors. They 
assumed a third of the available data for validation in each streamgauge (following the Split-Sample 
Test; Klemes, 1986). Wet, normal and dry years were well represented in both calibration and 
validation periods selected for the streamgauges, even if this selection varied from one to another.
In the case of the IS (SS), the validation data was the streamflow measured from 1944 to 1976 
(from 1993 to 2007), while the remaining data from 1912 to 2017 (from 1973 to 2017) was used 
for model calibration. 
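
The gap-filling step and the Nash-Sutcliffe coefficient can be sketched as follows. The sketch assumes that gaps are marked as NaN in the observed monthly series and that a simulated series of the same length is available from the rainfall-runoff model; the function names and the values are hypothetical.

```python
import numpy as np

def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe coefficient computed on months with observations only."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    valid = ~np.isnan(observed)
    residual = np.sum((observed[valid] - simulated[valid]) ** 2)
    variance = np.sum((observed[valid] - observed[valid].mean()) ** 2)
    return float(1.0 - residual / variance)

def fill_gaps(observed, simulated):
    """Keep observations where available; use simulated flow in the gaps."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return np.where(np.isnan(observed), simulated, observed)

# Hypothetical monthly flows with one gap (NaN), for illustration only.
obs = np.array([1.0, np.nan, 3.0, 5.0])
sim = np.array([1.5, 2.0, 2.5, 5.0])
nse = nash_sutcliffe(obs, sim)
filled = fill_gaps(obs, sim)
print(nse, filled)
```

Observed values are never overwritten in this sketch, so the simulated series only contributes in months without measurements.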


3.3 Time series model adjustment and assessment 


A cross-validation approach was adopted to evaluate streamflow forecasting. First, to facilitate the 
comparison of the model results between the studied basins, we had to choose the same validation 
period for both IS and SS. Second, the streamflow gaps, which were filled up with the rainfall- 
runoff modelling outputs, had to be concentrated in the training sets. Following these rules, three 
different intervals from the whole streamflow time series (1951-2015) were chosen for the training 
set and also for the validation set. The validation (training) sets had to show a good balance between 
dry, regular and wet streamflow seasons, because interannual variability of semi-arid streamflow 
is very high (Güntner and Bronstert, 2004). Moreover, dry and wet decades recur in the
Brazilian semi-arid area. Therefore, we selected the period combinations for training and validation
sets as shown in Table 1. 


Table 1 Period combinations for training and validation sets, which were used for the applied cross-validation
approach, given the 65-a (1951-2015) streamflow time series at both IS and SS


Combination    Training set              Validation set
I              1951-1979; 1990-2015      1980-1989
II             1951-1989; 2000-2015      1990-1999
III            1951-1999; 2010-2015      2000-2009


The model performance was calculated for each training set and over all validation sets together 
(30 a in total). There was only one gap in the validation set for both streamgauges, while the 
remaining gaps, which were filled up with the rainfall-runoff modelling outputs, were concentrated 
in the training sets. This cross-validation approach is similar to the classical k-folds cross-validation 
technique, but restricted to a validation period (1980-2009), whose data showed almost no gaps 
for both streamgauges. Considering that the streamflow time series is short (only 65
a) and much of it was actually the output of a rainfall-runoff model, we believe there is a good
balance between the length of the validation set and the length of the training set.
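
The period combinations of Table 1 can be reproduced with a short Python sketch: each combination holds out one decade for validation and trains on all remaining years of 1951-2015. The names `YEARS`, `VALIDATION_DECADES` and `split_years` are hypothetical.

```python
import numpy as np

YEARS = np.arange(1951, 2016)  # 65 years, 1951-2015

# Validation decades of combinations I, II and III in Table 1.
VALIDATION_DECADES = {"I": (1980, 1989), "II": (1990, 1999), "III": (2000, 2009)}

def split_years(combination):
    """Return (training years, validation years) for one combination."""
    start, end = VALIDATION_DECADES[combination]
    is_validation = (YEARS >= start) & (YEARS <= end)
    return YEARS[~is_validation], YEARS[is_validation]

for name in VALIDATION_DECADES:
    train, valid = split_years(name)
    print(name, len(train), "training years,", len(valid), "validation years")
```

Each combination therefore uses 55 training years and 10 validation years, and the three validation decades together cover 1980-2009.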

LC needed no model adjustment, whereas for LA we adopted the measurements at the last 2 steps,
which gave the highest correlations with the predictand in the training set (not shown). We tried
from 1 to 7 nearest neighbours for k-NN and adjusted the power parameter of each k-NN model
to maximize the model performance over the training sets. Applications of k-NN models to
streamflow time series can be found in the literature (Wu et al., 2009; Wu and Chau, 2010;
Tongal, 2020). AR was fitted by the method of least squares for each training set (I, II and III in
Table 1). Then, each fitted AR model was applied to its corresponding validation subset (10 a).

Alexandre C COSTA et al.: Monthly and seasonal streamflow forecasting of large dryland...


We assumed that the criterion of acceptance (reliability) of the validation results is that the model
forecast is more skillful than the streamflow climatology. The climatology is an adequate
benchmark, since water management in the studied basins typically relies on it instead of seasonal
forecasting as a basis for decision making, which was also assumed by Seibert et al. (2017). The
RMSE is the standard deviation (SD) of the residuals (forecast errors). So, a higher (lower) RMSE
means a lower (higher) model performance. To facilitate the comparison of forecast reliability
between the models and the streamflow climatology, we used the standardized root mean square 
error (SRMSE) as the performance criterion. SRMSE is defined as RMSE divided by the SD of the 
test set (Perreti et al., 2013). SRMSE greater than unity indicates predictions less accurate than 
simply predicting the mean of the test set (Perreti et al., 2013). In this study, we defined the mean 
of the test set (training or validation) as streamflow climatology. So, according to Yossef et al. 
(2013), when SRMSE is equal to 1.00, the forecast skill is equal to that of a climatological forecast. 
SRMSE smaller than 1.00 indicates that it is more skillful than climatology; whereas SRMSE 
greater than 1.00 indicates less skill than climatology. SRMSE approaches 0.00 for a perfect 
forecast. 
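
The SRMSE can be computed as in the following minimal Python sketch. It assumes the population SD (division by the number of values) in the denominator, so that a forecast equal to the mean of the test set yields exactly 1.00; the function name and the values are hypothetical.

```python
import numpy as np

def srmse(observed, forecast):
    """RMSE of the forecast divided by the SD of the observed test set."""
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    rmse = np.sqrt(np.mean((forecast - observed) ** 2))
    return float(rmse / np.std(observed))  # np.std uses ddof=0 by default

# Hypothetical test set, for illustration only.
obs = np.array([0.0, 2.0, 4.0, 10.0])
climatology = np.full(obs.shape, obs.mean())
print(srmse(obs, climatology))  # climatological forecast
print(srmse(obs, obs))          # perfect forecast
```

If the sample SD (ddof=1) were used instead, the climatological forecast would score slightly below 1.00, so the choice of SD should be stated when results are compared.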

For the assessment of the deterministic forecasting of the streamflow, the error was simply 
calculated applying Equation 1 for training and validation sets. Considering the probabilistic 
streamflow forecasting, in which we assigned a PDF with a confidence interval for the forecast 
using the model errors in the training set, the forecast errors in the validation set (ev) were calculated 
in a different manner (Eq. 2). 


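$$
e_v =
\begin{cases}
X_{t+m\Delta t} - \left(F + \varepsilon^{+}\right), & \text{if } X_{t+m\Delta t} > F + \varepsilon^{+} \\
X_{t+m\Delta t} - \left(F + \varepsilon^{-}\right), & \text{if } X_{t+m\Delta t} < F + \varepsilon^{-} \\
0, & \text{otherwise}
\end{cases}
\qquad (2)
$$


where $\varepsilon^{+}$ ($\varepsilon^{-}$) is the upper (lower) limit of the confidence interval and F is the deterministic term.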
So, when a measurement in the validation set occurred inside the limits of the confidence intervals 
of the forecast PDF, the error was zero. The errors from Equation 2 are smaller than those from the
application of the deterministic term alone. However, if the chosen confidence interval
is too wide, the probabilistic forecast may not be useful for real-world issues, even if it provides a very
low SRMSE. Therefore, in the context of probabilistic forecasting, a compromise must be found
between the length of the confidence interval and the resulting SRMSE. In this work, we compared
the length of the confidence interval to the SD of the streamflow series. The necessary box-and- 
whisker plots were generated using the web-tool BoxPlotR (http://shiny.chemgrid.org/boxplotr/). 
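
The error definition of Equation 2 can be sketched in Python as follows. The sketch assumes that the error outside the envelope is measured from the nearest envelope limit, which is our reading of the equation; the function name and the values are hypothetical.

```python
import numpy as np

def interval_errors(observed, point_forecast, eps_lower, eps_upper):
    """Validation errors that are zero inside the forecast envelope."""
    observed = np.asarray(observed, dtype=float)
    point_forecast = np.asarray(point_forecast, dtype=float)
    upper = point_forecast + eps_upper
    lower = point_forecast + eps_lower
    above = observed - upper  # used where the observation exceeds the envelope
    below = observed - lower  # used where the observation falls below it
    return np.where(observed > upper, above, np.where(observed < lower, below, 0.0))

# Hypothetical values: deterministic forecast F = 10 with envelope [8, 13].
obs = np.array([5.0, 12.0, 20.0])
errors = interval_errors(obs, np.full(3, 10.0), eps_lower=-2.0, eps_upper=3.0)
print(errors)
```

Widening the envelope drives more errors to zero and lowers the SRMSE, which is why the envelope length has to be judged against the SD of the streamflow series.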


3.4 Jaguaribe River Basin application 


The Jaguaribe River flows normally from January to June at IS and SS. Rarely, river flows are 
relevant in December and in July of the following year. Since past streamflow measurements are necessary for
this approach, we chose the streamflow in March, April, May and June (the middle and end of the
rainy season) as predictands at seasonal and monthly scale. We adopted four different lead times:
one month ahead for monthly scale and two, three and four months ahead for seasonal scale. 

The streamflows in January and February (the beginning of the rainy season) were always 
predictors at monthly and seasonal scale. However, the number of predictors increases depending 
on the lead time and the predictand at monthly scale (Table 2). Table 2 shows the 7 forecasts run in
this study after the combination of predictands, predictors and lead times. The predictands and
predictors in Table 2 are indicated by the month in which the streamflow is observed. So, we ran
a monthly streamflow forecasting with a fixed lead time of one month and a variable number of
predictors. We also ran an SSF with a variable lead time (two, three and four months) and a fixed
number of predictors, which were the streamflows in January and February. The streamflow
forecasting began (finished) with the predictand in March (June). Although streamflows, which are
observed in January and February, show smaller variances (Figs. 2 and 3), they should be taken 
into account as predictors, because they demonstrated a relevant prediction power at both scales. 

An additional difficulty is the dryness at the beginning of the rainy season, because non-flow 
states in January and February clearly hamper the streamflow forecasting for the remaining season 
from March to June, using the aforementioned time series models. Therefore, in the case of non- 
flow states for the predictors, we added a new assumption for the k-NN approach: the nearest 
neighbours to the predictand were those also closer in time (the hydrologic persistence;
Koutsoyiannis, 2005), besides the Euclidean distance of the traditional k-NN approach. 
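
One possible way to combine the Euclidean distance with closeness in time is sketched below in Python. The text does not specify how the two criteria were combined, so the tie-breaking rule used here (equal distances are ranked by the absolute difference in years) is an assumption, and the function name and values are hypothetical.

```python
import numpy as np

def persistence_neighbours(train_X, train_years, predictors, target_year, k=3):
    """Indices of the k nearest training years, ties broken by closeness in time."""
    train_X = np.asarray(train_X, dtype=float)
    dist = np.linalg.norm(train_X - np.asarray(predictors, dtype=float), axis=1)
    gap = np.abs(np.asarray(train_years) - target_year)
    # np.lexsort sorts by the last key first: distance, then time gap.
    order = np.lexsort((gap, dist))
    return order[:k]

# Hypothetical example: non-flow predictors (January and February flows of zero).
X = [[0.0, 0.0], [0.0, 0.0], [1.0, 2.0], [0.0, 0.0]]
years = [1960, 1985, 1986, 1970]
chosen = persistence_neighbours(X, years, [0.0, 0.0], target_year=1988, k=2)
print(chosen)
```

With zero-flow predictors, several training years have a distance of exactly zero, so the time gap alone decides which of them become neighbours.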

The construction of reservoirs, land use changes and irrigation schemes, which interact with each 
other in the study catchments, over the last century may hamper the streamflow predictability. 
However, there are no shift terms or clear trends in the streamflow time series, which are expected 
when such nonstationarities are relevant. Therefore, we consider them of minor importance at such 
a large spatial scale, when compared to data uncertainty and natural streamflow variability, which 
are quite high for dryland rivers. This assumption allows a priori the applicability of the presented 
methodology. 


Table 2 Developed streamflow forecasting in the Jaguaribe River Basin after the combination of predictands,
predictors and lead times.


Predictand           Predictor
March                Jan, Feb                   -           -           -
April                Jan, Feb, Mar              Jan, Feb    -           -
May                  Jan, Feb, Mar, Apr         -           Jan, Feb    -
June                 Jan, Feb, Mar, Apr, May    -           -           Jan, Feb
Lead time (month)    1                          2           3           4


Note: The predictands and predictors are indicated by the month in which the streamflow is observed. -, not applicable.
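
The seven forecast setups of Table 2 can be enumerated with a short Python sketch. The dictionary keys and the function name are hypothetical; the rules follow the table: at monthly scale all months from January up to the month before the target are predictors, and at seasonal scale only January and February are used.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]

def forecast_setups():
    """List the monthly and seasonal forecast setups of Table 2."""
    setups = []
    for target in ["Mar", "Apr", "May", "Jun"]:
        idx = MONTHS.index(target)
        # Monthly scale: lead time of one month, predictors up to the previous month.
        setups.append({"predictand": target, "predictors": MONTHS[:idx], "lead_time": 1})
        # Seasonal scale: January and February only, for targets after March.
        if idx > 2:
            setups.append({"predictand": target, "predictors": MONTHS[:2], "lead_time": idx - 1})
    return setups

for setup in forecast_setups():
    print(setup)
```

The lead time at seasonal scale is counted from February, the last predictor month, which gives two, three and four months for April, May and June.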
Fig. 2 Average monthly hydrograph (1951-2015) of the Iguatu streamgauge (IS), with 3.7% of gap-filled
streamflow data (29 months) using a rainfall-runoff model 
Fig. 3 Average monthly hydrograph (1951-2015) of the Salgado streamgauge (SS), with 35.1% of gap-filled
streamflow data (278 months) using a rainfall-runoff model 
4 Results


4.1 Gap-filled hydrographs 


The average monthly hydrographs of the IS and SS are presented in Figures 2 and 3, respectively, 
which showed that the high streamflow season occurred only in March, April and May, while 
negligible river discharge was observed from July to December, except for some discharge outliers 
at the IS in December and at the SS in December and in July of the following year. There were two periods of
transition between high and very low streamflow states: the beginning and the end of the 
streamflow season, which happened in January and February, and in June, respectively. The 
streamflow in the beginning of the streamflow season was larger than that in the end of the 
streamflow season. 

The rising hydrograph limb took four months, from January to April, when the streamflow peak
occurred, while the falling limb took just two months (May and June). The streamflow in March,
prior to the peak, was higher than that in May, after the peak. In the rising limb, the fastest
streamflow change happened from February to March, i.e., during the transition between the
beginning of the season and the high streamflow season. In the falling limb, the fastest change
occurred from April (peak flow) to May.

During the high streamflow season, the streamflows were larger at the IS than at the SS. The
beginning and the end of the streamflow season were relatively drier at the IS. At the beginning
(end) of the streamflow season, we observed a non-flow state in 16 (20) out of 65 a at the IS,
whereas it happened only once (four times) at the SS. Therefore, the hydrograph was sharper at the
IS and, for both streamgauges, the end of the season was drier than the beginning of the season
and the high streamflow season.


4.2 Monthly streamflow forecasting 


4.2.1 Deterministic forecasting 


The performance of the time series models (k-NN, LC, LA and AR) for each predicand in the
validation set is presented in Figure 4. We excluded outliers with SRMSE (forecast error) greater
than 2.00. Accordingly, one outlier (AR) for March and April at IS and two outliers (LC and LA)
for June at both streamgauges are not shown.


Fig. 4 SRMSE (standardized root mean square error) of the applied time series models in the validation set
(1980-2010) for the monthly streamflow forecasting at the IS and SS. The horizontal axis shows the month and
streamgauge (Mar_IS, Mar_SS, Apr_IS, Apr_SS, May_IS, May_SS, Jun_IS and Jun_SS)


Most of the models (73%) performed better than the mean prediction (streamflow climatology),
i.e., they showed SRMSE less than 1.00. Their mean SRMSE was 0.80, which was 20% better than the
climatology. For every month at both streamgauges, at least six models forecast better than the
mean prediction. The models that showed all SRMSEs less than 1.00 were 3-NN, 4-NN, 6-NN and 7-NN.
The 2-NN and 5-NN models showed just one SRMSE value higher than 1.00. AR had the worst
performance, with 6 out of the 8 SRMSEs higher than 1.00, followed by LC (5 out of 8) and LA
(4 out of 8).

Considering the best-fit model of each month in the validation set (Table 3), 7 out of the 8 models
were a kind of k-NN approach. Only the streamflow in April (peak flow) at SS was better forecast
with the LC model, the simplest approach. According to Table 3, SRMSE varied from 0.65 in June
(35% better than the mean prediction) to 0.90 in April (10% better than the climatology) at IS. For
SS, the SRMSE varied from 0.67 in March (33% better than the mean prediction) to 0.86 in April
(14% better than the climatology).


Table 3 SRMSE (standardized root mean square error) of the best-fit models in the validation set for the monthly
streamflow forecasting at the Iguatu streamgauge (IS) and Salgado streamgauge (SS)

Model performance   Mar_IS  Mar_SS  Apr_IS  Apr_SS  May_IS  May_SS  Jun_IS  Jun_SS
Best-fit model      3-NN    4-NN    7-NN    LC      2-NN    2-NN    5-NN    7-NN
SRMSE validation    0.71    0.67    0.90    0.86    0.72    0.77    0.65    0.71
SRMSE training      0.93    0.72    0.89    0.76    0.89    0.80    0.73    0.83

Note: LC, locally constant approach; k-NN, k-nearest-neighbours algorithm with k ranging from 1 to 7. SRMSE is the RMSE (root mean
square error) divided by the SD (standard deviation) of the test set (validation or training).
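
The SRMSE defined in the note above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the use of the population SD (NumPy's default, `ddof=0`) is an assumption, since the text does not state which SD estimator was adopted. With this choice, a forecast equal to the mean of the test set (the climatology) yields SRMSE = 1.00, so that 1 - SRMSE is the relative improvement over the climatology.

```python
import numpy as np

def srmse(observed, forecast):
    """RMSE of the forecast divided by the SD of the observed test set."""
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    rmse = np.sqrt(np.mean((forecast - observed) ** 2))
    return rmse / np.std(observed)  # population SD (assumption)

def improvement_over_climatology(srmse_value):
    """Fraction by which a forecast outperforms the mean predictor."""
    return 1.0 - srmse_value

# Illustrative (not observed) March streamflows in m3/s.
obs = np.array([10.0, 50.0, 200.0, 20.0])
climatology = np.full(obs.shape, obs.mean())
print(srmse(obs, climatology))
print(improvement_over_climatology(0.67))
```

For instance, the SRMSE of 0.67 reported for March at SS corresponds to a forecast that is 33% better than the climatology.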


In the rising limb (March and April), the SRMSE in March was less than that in April, when
SRMSE was also at its highest, for both streamgauges (Fig. 4). In the falling limb (May and June),
a higher SRMSE in May was also the case for both streamgauges. In the high streamflow season
(March, April and May), the streamflow was better forecast in March at SS and slightly better in
May at IS. Considering the best-fit model of each month in the validation set (Fig. 4; Table 3), we
found that the SRMSE in the rising limb at IS was higher than that at SS. However, SRMSE in the
falling limb at IS was less than that at SS.

Comparing SRMSE between the validation and training sets (Table 3), it was approximately the
same for 3 out of the 8 predicands (absolute difference less than 0.03) and higher in the validation
set for one predicand (the streamflow in April at SS). However, half of the predicands showed a
higher SRMSE in the training set, namely the streamflow forecasts in March and June at both
streamgauges.

To illustrate the deterministic streamflow forecasting at monthly scale, we present the forecasts
of the best-fit time series model (4-NN) in March at SS (Fig. 5). In the validation set, the model
was able to represent well the high, near-average and low flow states, but underestimated the largest
peak. Also, the transition between these states (interannual variability) was well simulated, but with
some delay in the first decade (1980-1989) of the test set. Overall, the dominant periods of high
and low flows were very well differentiated, especially the latter ones.

Fig. 5 Monthly deterministic forecasting in the validation set. The predicand is the streamflow in March at the SS.
The predictors are the streamflow in January and February. The time series model, which is the best-fit one, is 4-NN.
SRMSE (forecast error) in the validation set is 0.67, which means the 4-NN model is 33% better than the mean
predictor (climatology). Here, SRMSE is the RMSE divided by the SD (standard deviation) of the validation set.

4.2.2 Probabilistic forecasting 

The best-fit time series models (Table 3) were assumed as dynamical parts of Equation 1. Then, we 
were able to calculate the stochastic term, using the errors in the training set. Table 4a and b show 
a summary of the forecast PDFs (probability density functions) for each predicand (streamflow in 
March, April, May and June) at both streamgauges, providing the length of the adopted confidence 
intervals (33%, 50% and 66%, respectively), SRMSE and the SD of the gap-filled streamflow data. 
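
A minimal Python sketch of how such an uncertainty envelope could be built is given below. It rests on several assumptions that are not confirmed by the text: the stochastic term is taken as additive, the errors are defined as observed minus forecast, the confidence interval is taken as the central interval of the empirical error distribution in the training set, and the forecast error of an observation is counted as zero inside the envelope and as the distance to the nearest bound outside it. The function names are hypothetical.

```python
import numpy as np

def envelope(det_forecast, training_errors, level):
    """Bounds from the central `level` interval of the training-set errors.

    training_errors: observed minus forecast in the training set (assumption).
    level: confidence level as a fraction, e.g., 0.33, 0.50 or 0.66.
    """
    low_q, high_q = 0.5 - level / 2.0, 0.5 + level / 2.0
    low_err, high_err = np.quantile(training_errors, [low_q, high_q])
    det = np.asarray(det_forecast, dtype=float)
    return det + low_err, det + high_err

def envelope_srmse(observed, lower, upper):
    """SRMSE with errors measured to the nearest envelope bound (assumption)."""
    obs = np.asarray(observed, dtype=float)
    miss = np.where(obs < lower, lower - obs,
                    np.where(obs > upper, obs - upper, 0.0))
    return np.sqrt(np.mean(miss ** 2)) / np.std(obs)

# Illustrative (not observed) values in m3/s.
errors = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])
lower, upper = envelope([20.0, 40.0], errors, level=0.50)
print(upper - lower)                              # envelope length
print(envelope_srmse([22.0, 60.0], lower, upper))
```

Under these assumptions, a wider confidence interval can only reduce the SRMSE, which is why the envelope length has to be reported together with the forecast error.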


Table 4a Monthly probabilistic forecasting of the validation set at the IS

Confidence   March (92)*      April (204)*     May (79)*        June (12)*
interval     Length  SRMSE    Length  SRMSE    Length  SRMSE    Length  SRMSE
33%          39      0.54     49      0.88     12      0.70     2       0.62
50%          72      0.43     93      0.84     31      0.66     3       0.59
66%          128     0.26     195     0.78     55      0.62     6       0.53

Note: The deterministic term of the probabilistic forecasting is based on the best-fit time series model in the validation set and the
stochastic term is based on the errors in the training set. *, the numbers in the brackets are the SD (m³/s) of the gap-filled streamflow
series.


Table 4b Monthly probabilistic forecasting of the validation set at the SS

Confidence   March (85)*      April (123)*     May (68)*        June (14)*
interval     Length  SRMSE    Length  SRMSE    Length  SRMSE    Length  SRMSE
33%          26      0.58     32      0.78     12      0.73     4       0.66
50%          60      0.47     58      0.72     31      0.68     7       0.64
66%          89      0.40     101     0.67     63      0.60     13      0.59


Considering a narrow uncertainty envelope over the deterministic forecasting, i.e., a length of
the confidence interval (33%) that was smaller than the SD of the streamflow data, the forecasting
performance improved markedly for all predicands at SS and for the streamflow in March at IS.
Comparing Table 3 with Tables 4a and b, SRMSE was reduced by 0.11, 0.08, 0.04 and 0.05 for the
streamflow in March, April, May and June at SS, respectively. The highest reduction of SRMSE
(0.17) was found for the streamflow in March at IS. The probabilistic forecasting at the narrowest
uncertainty envelope (Tables 4a and b) outperformed the mean predictor (streamflow climatology)
by 46%, 42%, 22%, 27% and 34% for the streamflow in March (IS), March (SS), April (SS),
May (SS) and June (SS), respectively.

For the other predicands at IS (streamflow in April, May and June), the improvement in the
forecasting performance was marginal, with a SRMSE reduction of 0.02. Only a wider uncertainty
envelope (50% or 66% confidence interval) led to a better forecasting performance for these
predicands. For the streamflow in June, a 66% confidence interval of the in-training-set error
distribution, which corresponded to a length of 6 m³/s (or 50% of the streamflow data SD), produced
a significant SRMSE reduction of 0.12, with the probabilistic forecasting performing 47% better
than the climatology. A similar SRMSE reduction for the streamflow in April (0.12) and
May (0.10) was also achieved at the 66% confidence interval, but with a much larger length:
96% and 70% of the SD for the streamflow in April and May, respectively. The IS probabilistic
forecasting for the streamflow in April (May) at the narrowest uncertainty envelope (33%) (Table
4a) outperformed the mean predictor by 12% (30%).

The main reason for this slight performance increase, even with a larger length of the confidence
interval, was two underestimated peaks for the streamflow in April and May. These peaks
occurred in 1985 and 1989, which were very wet years. To illustrate the probabilistic forecasting
at monthly scale, highlighting the aforementioned difficulty as well, we present the forecasts of the
best-fit time series model (2-NN) enveloped by a PDF based on the 50% confidence interval of the
in-training-set error distribution (Fig. 6). The predicand was the streamflow in May at IS. In the
validation set, most of the streamflow data were close to or inside the bounds of the forecast
envelope. Although the states of the flows (high, near-average and low) were very well represented,
the very high flows in 1985 and 1989 were considerably underestimated. Moreover, there was a
relevant overestimation in 1981. As in Figure 5, the forecasting in the first decade (1980-1989)
of the test set was less reliable compared to the other two decades.

Fig. 6 Monthly probabilistic forecasting in the validation set (30 a) with a confidence interval of 50%, which
defines the upper and lower bound forecast. The predicand is the streamflow in May at the IS. The predictors are 
the streamflow in January, February, March and April. The time series model is 2-NN. SRMSE in the validation set 
is 0.66, which means the stochastic approach is comparatively 34% better than the mean predictor (climatology). 


4.3 Seasonal streamflow forecasting (SSF)


4.3.1 Deterministic forecasting


The performance of the time series models (k-NN, LC, LA and AR) for each predicand in the
validation set is presented in Figure 7. We excluded some outliers, whose SRMSE was higher than
2.00. Then, we did not show one outlier (AR) for April and May at IS and three outliers (LC, LA
and AR) for June at both streamgauges. Note that the SSF in March is not shown, because it is the
same as the monthly streamflow forecasting, since the predictors (January and February) are the
same.

Fig. 7 SRMSE of the applied time series models in the validation set (1980-2010) for the seasonal streamflow
forecasting (SSF) in April (Apr), May and June (Jun) at the IS and SS

Most of the models (55%) did not perform better than the mean prediction (streamflow
climatology), i.e., they showed SRMSE higher than 1.00. For the streamflow forecasting in April
and May at SS, no time series model forecast better than the mean prediction. However, for the
remaining predicands, there were at least 4 models that showed SRMSEs less than 1.00. The 4-NN,
5-NN, 6-NN and 7-NN models had the best performance, with 6 out of the 8 SRMSEs less than 1.00.
The 3-NN model showed four SRMSE values less than 1.00. AR had the worst performance, with all
SRMSEs higher than 1.00, followed by LA, LC, 1-NN and 2-NN, with 6 out of the 8 SRMSEs higher
than 1.00.

Considering the best-fit model of each month in the validation set (Table 5), almost all models
were a kind of k-NN approach. Only the streamflow in May at IS was better forecast with the LA
model. According to Table 5, SRMSE varied from 0.78 in April (22% better than the mean
prediction) to 0.87 in June (13% better than the climatology) at IS. For SS, the SRMSE varied from
0.94 in June (only 6% better than the mean prediction) to 1.06 in May (6% worse than the climatology).
As expected, the performance of the SSF decreased with increasing lead time at IS: March
(SRMSE, 0.71), April (SRMSE, 0.78), May (SRMSE, 0.80) and June (SRMSE, 0.87). SSF
performance showed a different behaviour with increasing lead time at SS: March (0.67),
April (1.05), May (1.06) and June (0.94). Except for the streamflow in March, the seasonal
predicands were better forecast for IS.


Table 5 SRMSE of the best-fit models in the validation set for the seasonal streamflow forecasting (SSF) in April,
May and June at the IS and SS

Model performance   Apr_IS  Apr_SS  May_IS  May_SS  Jun_IS  Jun_SS
Best-fit model      6-NN    7-NN    LA      7-NN    7-NN    6-NN
SRMSE validation    0.78    1.05    0.80    1.06    0.87    0.94
SRMSE training      0.95    1.03    1.60    1.03    1.04    1.00

Note: SSF in March is not shown, because it is the same as the monthly streamflow forecasting, since the predictors (January and February)
are the same.


Comparing SRMSE between the validation and training sets (Table 5), it was approximately the
same for 2 out of the 6 predicands (absolute difference less than 0.03), namely the streamflow
forecasts in April and May at SS. The remaining predicands showed a higher SRMSE in the
training set.

To illustrate the deterministic streamflow forecasting at seasonal scale, we present the forecasts
of the best-fit time series model (6-NN) in April at IS as an example (Fig. 8). In the validation set,
the model clearly overestimated the streamflow from 1990 to the mid-2000s and underestimated
it in the early 1980s and the late 2000s. Despite this over- and underestimation, the model
was able to differentiate well the long periods of high and low flows, including a very good
simulation of the largest peak.


4.3.2 Probabilistic forecasting 


The best-fit time series models (Table 5) were assumed as dynamical parts of Equation 1. Then, we
were able to calculate the stochastic term, using the errors in the training set. Tables 6a and b show
a summary of the forecast PDFs for each predicand (streamflow in April, May and June) at both
streamgauges, providing the length of the adopted confidence intervals (33%, 50% and 66%),
SRMSE and the SD of the gap-filled streamflow data. Note that the SSF in March is not shown,
because it is the same as the monthly streamflow forecasting, since the predictors (January and
February) are the same.

Fig. 8 Seasonal deterministic forecasting in the validation set (30 a). The predicand is the streamflow in April at
the IS. The predictors are the streamflow in January and February. The time series model, which is the best-fit one,
is 6-NN. SRMSE in the validation set is 0.78, which means the 6-NN model is 22% better than the mean predictor
(climatology).


Table 6a Seasonal probabilistic forecasting of the validation set at the IS

Confidence   April (204)*     May (79)*        June (12)*
interval     Length  SRMSE    Length  SRMSE    Length  SRMSE
33%          81      0.67     40      0.56     5       0.78
50%          139     0.60     73      0.46     10      0.71
66%          235     0.48     98      0.38     16      0.60

Note: The deterministic term of the probabilistic forecasting is based on the best-fit time series model in the validation set and the
stochastic term is based on the errors in the training set. SSF in March is not shown, because it is the same as the monthly streamflow
forecasting, since the predictors (January and February) are the same. *, the numbers in the brackets are the SD (m³/s) of the gap-filled
streamflow series.


Table 6b Seasonal probabilistic forecasting of the validation set at the SS

Confidence   April (123)*     May (68)*        June (14)*
interval     Length  SRMSE    Length  SRMSE    Length  SRMSE
33%          54      0.95     24      0.93     5       0.86
50%          110     0.80     37      0.89     11      0.78
66%          164     0.67     58      0.83     18      0.71


Considering a narrow uncertainty envelope over the deterministic forecasting, i.e., a length of
the confidence interval (33%) that was smaller than the SD of the streamflow data, the forecasting
performance improved markedly for all predicands at both streamgauges. Comparing Table 5 to
Tables 6a and b, SRMSE was reduced by 0.11 (0.10), 0.24 (0.13) and 0.09 (0.08) for the streamflow
in April, May and June at IS (SS), respectively. The highest reduction of SRMSE was found for the
streamflow in May, followed by those in April and June, respectively. The probabilistic forecasting
at the narrowest uncertainty envelope (Tables 6a and b) outperformed the mean predictor
(climatology) as follows: 5% (33%), 7% (44%) and 14% (22%) for the streamflow in April, May
and June at SS (IS), respectively, showing a much better streamflow forecasting at IS.


5 Discussion 


The developed data-driven approach outperformed the streamflow climatology (the mean predictor
in the validation set) for all one-month-ahead monthly streamflows (8 predicands). On
average, it was 25% better than the climatology. SSF was also reliable (on average, 20% better than
the climatology), failing slightly only for the high flow season (April and May) of the SS (6% worse
than the climatology). Considering an uncertainty envelope, which was 50% and 40% of the data
SD, the streamflow forecasting performance increased considerably to 37% and 28% better than
the climatology at monthly and seasonal scale, respectively. These findings are an advance on
past studies on this subject in the Jaguaribe River Basin (Delgado et al., 2018; Pilz et
al., 2019), which were not able to perform better than the streamflow climatology, and are of the
same order of magnitude as those of studies carried out in other drylands that used a more
complex approach (Yossef et al., 2013; Seibert et al., 2017).

The global nonlinear k-NN approach was the best-fit time series model for 12 out of 14
predicands, although the number k of nearest neighbours and the power parameter of the inverse
distance weighting interpolation were not the same. The LC and LA approaches were the best-fit
models for the monthly-scale streamflow in April at SS and the seasonal-scale streamflow in May
at IS, respectively. The global linear AR model was the worst-fit time series model for both scales.
In the Limpopo River Basin in southern Africa, Seibert et al. (2017) found a different result for a
multivariate analysis of hydrological drought forecasting, in which a multiple linear regression
showed the best forecast skill compared to artificial neural networks and random forest regression
trees, despite their capabilities to represent nonlinear relationships.
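
To make the roles of k and of the power parameter concrete, a k-NN forecast with inverse distance weighting can be sketched in Python as follows. This is an illustrative sketch rather than the implementation used in this study: the function name, the Euclidean distance in the predictor space and the handling of exact matches are assumptions.

```python
import numpy as np

def knn_forecast(train_x, train_y, query, k=4, power=1.0):
    """Forecast the predicand as an inverse-distance-weighted mean of the
    outcomes of the k most similar years in the training set.

    train_x: predictor streamflows, one row per training year.
    train_y: predicand streamflow of each training year.
    query: predictor streamflows of the year to be forecast.
    """
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    dist = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]
    exact = nearest[dist[nearest] == 0.0]
    if exact.size > 0:
        # Identical predictor states: average their outcomes directly.
        return float(train_y[exact].mean())
    weights = 1.0 / dist[nearest] ** power
    return float(np.sum(weights * train_y[nearest]) / np.sum(weights))

# Illustrative (not observed) January-February streamflows and March outcomes.
x = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
y = [10.0, 20.0, 40.0, 1000.0]
print(knn_forecast(x, y, query=[0.5, 0.0], k=2, power=1.0))
```

A larger power parameter gives more weight to the closest neighbours, whereas a power of zero reduces the forecast to a simple average of the k neighbours.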

The outputs of the rainfall-runoff model used to fill the gaps of the streamflow series were
fundamental to a better streamflow forecasting at SS (modelled data were 35.1% of the whole
streamflow series), while their impact on the model performance was negligible at IS (modelled
data were only 3.7% of the whole streamflow series). Disregarding the modelled data at SS, for
example, the SRMSE of the best-fit time series model increased from 0.67 to 0.75, 0.77 to 0.82 and
0.71 to 0.79 for the monthly-scale streamflow in March, May and June, respectively. There was no
performance change for the streamflow forecasting in April, because the best-fit model was the LC,
which did not need any information from the training set, where the modelled data were
concentrated. This finding may create new opportunities for streamflow forecasting at monthly and
seasonal scale in data-scarce regions, leading to positive impacts on water allocation and drought
management.
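
The gap-filling step itself is simple and can be sketched in Python as follows, assuming that missing observations are stored as NaN and that the rainfall-runoff model output covers the same months. The function name and the illustrative values are hypothetical.

```python
import numpy as np

def fill_gaps(observed, simulated):
    """Replace missing observations with rainfall-runoff model outputs.

    Returns the gap-filled series and the fraction of modelled values.
    """
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    gaps = np.isnan(observed)
    filled = np.where(gaps, simulated, observed)
    return filled, float(gaps.mean())

# Illustrative (not observed) monthly streamflows in m3/s.
obs = [1.0, np.nan, 3.0, np.nan]
sim = [9.0, 2.0, 9.0, 4.0]
filled, fraction = fill_gaps(obs, sim)
print(filled, fraction)
```

Observed values are always kept where they exist, so the modelled data only affect the months with gaps.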

SRMSEs were mainly driven by the streamflow intra-seasonality at monthly scale, while they
were driven by the forecast lead time at seasonal scale. In the context of streamflow forecasting,
the latter is a recurrent result and was also observed in other studies that dealt with the role of
the antecedent streamflow or the initial conditions for streamflow forecasting (Yossef et al., 2013;
Trambauer et al., 2015).

The monthly-scale streamflow forecasting performed better than the seasonal-scale one, except for
the streamflow in April at IS. The monthly-scale rising limb was better forecast for SS, while the
monthly-scale falling limb was better forecast for IS. The former may be explained by the
hydrograph sharpness at IS, due to the combination of a drier beginning of the streamflow season
and a larger high streamflow season. The latter may be a result of a much more complex baseflow
behaviour at SS, due to the discharge from and the interaction between the different regional
aquifers located in the upstream basin of the Salgado River (Machado et al., 2007).

The seasonal-scale streamflow from April to June was much better forecast for IS
than for SS. This outstanding result of the seasonal-scale streamflow forecasting at IS may be
explained by the hydrologic persistence assumption (Koutsoyiannis, 2005), which quite often
held at IS. This assumption always worked when the beginning of the streamflow season
(January and February) showed non-flow, which was the case for 12 out of 30 a in the validation
set of each seasonal-scale predicand (April, May and June) at IS.

The probabilistic strategy significantly improved the streamflow forecasting at a narrow
uncertainty envelope, although a larger envelope length was found at seasonal scale. This strategy
had a better performance in the transition between the beginning of the season and the high flow
season (March) at monthly scale and in the transition between the high flow season and the end of
the season (May) at seasonal scale. Some of the predicands showed higher SRMSEs in the training
set, which may be explained by the concentration of modelled streamflow data in this set and/or a
more complex streamflow series of the training set. However, since higher SRMSEs in the training
set led to a larger uncertainty envelope length in the validation set, that possible side effect was
taken into account in the construction of the probabilistic forecasting, which presented reliable
results even at the narrowest uncertainty envelope.

Monthly and seasonal forecasts have a high uncertainty; therefore, it is an important task to 
convey the skill of the forecast system, and for end users to include uncertainty information in the 
decision-making process (Seibert et al., 2017). This is achieved by probabilistic streamflow 
forecasts that may serve as inputs to water resources operation models, which provide the 
probability of net benefits or losses for each sector of the economy, such as irrigation, industry and 
fish farming. If the probability of loss is significant, policymakers may want to divert some 
resources in advance to mitigation procedures, or at least be prepared for a significant emergency 
intervention (Sittichok et al., 2016). However, deterministic forecasts are still common practice in 
many water management systems in Brazil. 


6 Conclusion 


We developed a data-driven approach to forecast dryland streamflows at monthly and seasonal 
scale, relying only on the streamflow series itself. The deterministic forecasting was evaluated by 
the application of four different types of time series model (LC, LA, k-NN and AR). The 
probabilistic forecasting was based on the deterministic forecast enveloped by a PDF from the time 
series model errors in the training set. The outputs of a conceptual rainfall-runoff model were used 
to fill up the gaps of the streamgauges and to increase the available streamflow data. To our 
knowledge, this kind of hydrological model application for dryland streamflow forecasting has not 
been reported yet. This methodology was applied to two large catchments in the Brazilian
semi-arid area.

Our approach outperformed the climatology for most streamflows at monthly and seasonal scale 
(12 out of 14 predicands), in which the global nonlinear model and the global linear model were 
the best-fit and worst-fit time series models, respectively. In the probabilistic strategy, the 
deterministic forecast enveloped by a PDF, which was considerably narrower than the data SD, 
increased the forecasting performance by about 50% at both scales. The outputs of the rainfall- 
runoff model to fill up the gaps of the streamflow series played an important role in improving 
streamflow forecasting. 

The developed data-driven approach is mathematically and computationally very simple, demands
few resources to accomplish its operational implementation and is applicable to other dryland
watersheds. Moreover, since the studied watersheds have characteristics (e.g., large drainage area,
short time series with gaps and high streamflow interannual variability) that are similar to those of
drylands in need of streamflow forecasting information, we believe that the transfer potential of
this study is high. The streamflow forecasts may be part of drought forecasting systems and help
plan water allocation months in advance. Moreover, the developed strategy can serve as a baseline
for more complex streamflow forecast systems.